Smart Paper Notes

MultiCoNER: A Large-scale Multilingual dataset for Complex Named Entity Recognition

Shervin Malmasi , Anjie Fang , Besnik Fetahu , Sudipta Kar , Oleg Rokhlenko

Category: Natural Language Processing

2022-08-30

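We present MultiCoNER, a large multilingual dataset for Named Entity Recognition (NER) that covers 3 domains (Wiki sentences, questions, and search queries) across 11 languages, as well as multilingual and code-mixed subsets. The dataset is designed to represent contemporary challenges in NER, including low-context scenarios (short and uncased text), syntactically complex entities such as movie titles, and long-tail entity distributions. The 26M-token dataset is compiled from public resources using techniques such as heuristic-based sentence sampling, template extraction and slotting, and machine translation. We applied two NER models to the dataset: a baseline XLM-RoBERTa model and a state-of-the-art GEMNET model that leverages gazetteers. The baseline achieves moderate performance (macro-F1 = 54%), highlighting the difficulty of our data. GEMNET, which uses gazetteers, improves on it significantly (an average macro-F1 improvement of +30%). MultiCoNER poses challenges even for large pre-trained language models, and we believe it can help further research on building robust NER systems. MultiCoNER is publicly available at https://registry.opendata.aws/multiconer/, and we hope this resource will help advance research on various aspects of NER.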
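
The macro-F1 figures quoted above average the per-class F1 scores with equal weight, so rare entity types count as much as frequent ones. The following is a minimal sketch of that computation, not the evaluation script used for MultiCoNER; the function name `macro_f1`, the input layout, and the labels `PER` and `LOC` are illustrative assumptions.

```python
def macro_f1(counts):
    """Unweighted mean of per-class F1 scores.

    counts maps each entity label to a (true_pos, false_pos, false_neg)
    tuple. This layout is an assumption made for the sketch.
    """
    scores = []
    for tp, fp, fn in counts.values():
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when a class has no counts.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical counts for two entity classes.
example_counts = {"PER": (8, 2, 2), "LOC": (1, 3, 1)}
example_score = macro_f1(example_counts)
```

With these hypothetical counts the per-class scores are 0.8 and 1/3, so the macro average is about 0.567 even though most predictions belong to the well-handled class.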


Jamdani Motif Generation using Conditional GAN

MD Tanvir Rouf Shawon , Raihan Tanvir , Humaira Ferdous Shifa , Susmoy Kar , Mohammad Imrul Jubair

Category: Computer Vision

2022-12-22

Jamdani is the strikingly patterned textile heritage of Bangladesh. The exclusive geometric motifs woven on the fabric are the most attractive part of this craftsmanship and have had a remarkable influence on textile and fine art. In this paper, we develop a technique based on the Generative Adversarial Network that learns to generate entirely new Jamdani patterns from a collection of Jamdani motifs that we assembled. The newly formed motifs mimic the appearance of the original designs. Users can input the skeleton of a desired pattern as rough strokes, and our system completes the input by generating a full motif that follows the geometric structure of real Jamdani ones. To serve this purpose, we collected and preprocessed a dataset containing a large number of Jamdani motif images from authentic sources via fieldwork and applied a state-of-the-art method called pix2pix to it. To the best of our knowledge, this is currently the only available dataset of Jamdani motifs in digital format for computer vision research. Our experimental results with the pix2pix model on this dataset show satisfactory computer-generated images of Jamdani motifs, and we believe that our work will open a new avenue for further research.


Corruption-tolerant Algorithms for Generalized Linear Models

Bhaskar P Mukhoty , Debojyoti Dey , Purushottam Kar

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-11

This paper presents SVAM (Sequential Variance-Altered MLE), a unified framework for learning generalized linear models under adversarial label corruption in training data. SVAM extends to tasks such as least squares regression, logistic regression, and gamma regression, whereas many existing works on learning with label corruptions focus only on least squares regression. SVAM is based on a novel variance reduction technique that may be of independent interest and works by iteratively solving weighted MLEs over variance-altered versions of the GLM objective. SVAM offers provable model recovery guarantees superior to the state-of-the-art for robust regression even when a constant fraction of training labels are adversarially corrupted. SVAM also empirically outperforms several existing problem-specific techniques for robust regression and classification. Code for SVAM is available at https://github.com/purushottamkar/svam/
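
The idea of repeatedly solving weighted estimation problems can be illustrated with a toy reweighting loop for scalar least squares. This is only a sketch in the spirit of the description above and is not the SVAM algorithm: the exponential weighting rule, the starting point `w = 0`, the parameters `beta` and `growth`, and the function name `reweighted_least_squares` are all assumptions made for illustration.

```python
import math


def reweighted_least_squares(xs, ys, beta=0.01, growth=2.0, iterations=10):
    """Toy robust fit of y ~ w * x via iteratively reweighted least squares.

    Each point is weighted by exp(-beta / 2 * residual^2), so points with
    large residuals (likely corrupted labels) lose influence. beta is
    multiplied by `growth` each round, making the weights progressively
    sharper. Both the rule and the schedule are illustrative assumptions.
    """
    w = 0.0
    for _ in range(iterations):
        weights = [math.exp(-0.5 * beta * (y - w * x) ** 2) for x, y in zip(xs, ys)]
        numerator = sum(s * x * y for s, x, y in zip(weights, xs, ys))
        denominator = sum(s * x * x for s, x in zip(weights, xs))
        w = numerator / denominator  # closed-form weighted least squares
        beta *= growth
    return w


# Clean relationship y = 2x with one adversarially corrupted label.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, -30.0]
robust_w = reweighted_least_squares(xs, ys)
```

On this toy data the corrupted point's weight shrinks toward zero within a couple of rounds, so the estimate settles close to the clean slope of 2, whereas ordinary least squares would be pulled far away by the outlier.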


Accu-Help: A Machine Learning based Smart Healthcare Framework for Accurate Detection of Obsessive Compulsive Disorder

Kabita Patel , Ajaya Kumar Tripathy , Laxmi Narayan Padhy , Sujita Kumar Kar , Susanta Kumar Padhy , Saraju Prasad Mohanty

Category: Machine Learning

2022-12-05

The importance of smart healthcare in recent years cannot be overstated. This work proposes to extend the state of the art in smart healthcare by integrating solutions for Obsessive Compulsive Disorder (OCD). Identification of OCD from oxidative stress biomarkers (OSBs) using machine learning is an important development in the study of OCD. However, this process involves collecting OCD class labels from hospitals, collecting the corresponding OSBs from biochemical laboratories, creating an integrated and labeled dataset, designing an OCD prediction model with a suitable machine learning algorithm, and making the prediction model available to different biochemical laboratories for OCD prediction on unlabeled OSBs. Further, as the volume of labeled samples grows significantly over time, the prediction model must be redesigned for further use. The whole process requires distributed data collection, data integration, coordination between the hospital and the biochemical laboratory, dynamic design of the machine learning OCD prediction model using a suitable algorithm, and making the model available to the biochemical laboratories. With all of this in mind, Accu-Help, a fully automated, smart, and accurate conceptual model for OCD detection, is proposed to help biochemical laboratories detect OCD efficiently from OSBs. OSBs are classified into three classes: Healthy Individual (HI), OCD Affected Individual (OAI), and Genetically Affected Individual (GAI). The main component of the proposed framework is the design of the machine learning OCD prediction model. Accu-Help presents a neural network-based approach with an OCD prediction accuracy of 86 percent.


A Hybrid Deep Learning Anomaly Detection Framework for Intrusion Detection

Rahul Kale , Zhi Lu , Kar Wai Fok , Vrizlynn L. L. Thing

Categories: Artificial Intelligence | Machine Learning

2022-12-02

Cyber intrusion attacks that compromise users' critical and sensitive data are escalating in volume and intensity, especially with the growing connections between our daily life and the Internet. The large volume and high complexity of such attacks have impeded the effectiveness of most traditional defence techniques. At the same time, the remarkable performance of machine learning methods, especially deep learning, in computer vision has garnered research interest from the cyber security community in further enhancing and automating intrusion detection. However, expensive data labeling and the scarcity of anomalous data make it challenging to train an intrusion detector in a fully supervised manner. Intrusion detection based on unsupervised anomaly detection is therefore an important feature as well. In this paper, we propose a three-stage network intrusion detection framework based on deep learning anomaly detection. The framework integrates unsupervised (K-means clustering), semi-supervised (GANomaly), and supervised (CNN) learning algorithms. We evaluate the implemented framework and report its performance on three benchmark datasets: NSL-KDD, CIC-IDS2018, and TON_IoT.


Improving Text-to-SQL Semantic Parsing with Fine-grained Query Understanding

Jun Wang , Patrick Ng , Alexander Hanbo Li , Jiarong Jiang , Zhiguo Wang , Ramesh Nallapati , Bing Xiang , Sudipta Sengupta

Category: Natural Language Processing

2022-09-28

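Most recent research on Text-to-SQL semantic parsing relies on either the parser itself or simple heuristic-based approaches to understand the natural language query (NLQ). When synthesizing a SQL query, no explicit semantic information about the NLQ is available to the parser, which leads to poor generalization performance. In addition, without lexical-level fine-grained query understanding, linking between the query and the database can only rely on fuzzy string matching, which leads to suboptimal performance in real applications. In view of this, we present a general-purpose, modular neural semantic parsing framework based on token-level fine-grained query understanding. Our framework consists of three modules: a named entity recognizer (NER), a neural entity linker (NEL), and a neural semantic parser (NSP). By jointly modeling the query and the database, the NER model analyzes user intent and identifies entities in the query. The NEL model links typed entities to the schema and cell values in the database. The parser model leverages the available semantic information and linking results and synthesizes tree-structured SQL queries based on a dynamically generated grammar. Experiments on SQUALL, a newly released semantic parsing dataset, show that we achieve 56.8% execution accuracy on the WikiTableQuestions (WTQ) test set, outperforming the state-of-the-art model by 2.7%.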
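
Execution accuracy, the metric reported above, scores a predicted SQL query by running it and comparing its result with that of the reference query, rather than by comparing query strings. The sketch below illustrates the idea with Python's standard `sqlite3` module. It is not the SQUALL or WTQ evaluation script, and the table `films`, the helper name `execution_accuracy`, and the order-insensitive comparison are assumptions for illustration.

```python
import sqlite3


def execution_accuracy(conn, query_pairs):
    """Fraction of predicted queries whose result rows match the gold query's.

    query_pairs is a list of (predicted_sql, gold_sql) strings. Rows are
    compared after sorting, i.e. ignoring row order, which is a
    simplification assumed for this sketch.
    """
    correct = 0
    for predicted_sql, gold_sql in query_pairs:
        gold_rows = sorted(conn.execute(gold_sql).fetchall())
        try:
            predicted_rows = sorted(conn.execute(predicted_sql).fetchall())
        except sqlite3.Error:
            continue  # a query that fails to execute counts as wrong
        if predicted_rows == gold_rows:
            correct += 1
    return correct / len(query_pairs)


# Hypothetical toy database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, year INTEGER)")
conn.executemany("INSERT INTO films VALUES (?, ?)",
                 [("A", 1999), ("B", 2005), ("C", 2005)])

pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT title FROM films WHERE year > 2000",
     "SELECT title FROM films WHERE year = 2005"),
    # Wrong result.
    ("SELECT COUNT(*) FROM films", "SELECT COUNT(*) FROM films WHERE year = 2005"),
    # Invalid SQL.
    ("SELECT nothing FROM nowhere", "SELECT title FROM films"),
]
accuracy = execution_accuracy(conn, pairs)
```

The first pair shows why the metric is more forgiving than exact string match: two differently written queries are both accepted as long as they return the same rows.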


EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

Ahmad Darkhalil , Dandan Shan , Bin Zhu , Jian Ma , Amlan Kar , Richard Higgins , Sanja Fidler , David Fouhey , Dima Damen

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-09-26

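We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which come with new challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, for example when an onion is peeled, diced, and cooked, where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, and pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, and 67K hand-object relations, covering 36 hours of 179 untrimmed videos. Along with the annotations, we introduce three challenges in video object segmentation, interaction understanding, and long-term reasoning. For data, code, and leaderboards: http://epic-kitchens.github.io/visor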


One-Shot Federated Learning for Model Clustering and Learning in Heterogeneous Environments

Aleksandar Armacki , Dragana Bajovic , Dusan Jakovetic , Soummya Kar

Category: Machine Learning

2022-09-22

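We propose a communication-efficient approach for federated learning in heterogeneous environments. The system heterogeneity is reflected in the presence of $K$ different data distributions, with each user sampling data from only one of the $K$ distributions. The proposed approach requires only one communication round between the users and the server, thus significantly reducing the communication cost. Moreover, the proposed method provides strong learning guarantees in heterogeneous environments by achieving the optimal mean-squared error (MSE) rate in terms of the sample size, i.e., the rate of learning on all data points belonging to users with the same data distribution, provided that the number of data points per user is above a threshold that we explicitly characterize in terms of system parameters. Remarkably, this is achieved without any knowledge of the underlying distributions, or even of the true number of distributions $K$. Numerical experiments illustrate our findings and underline the performance of the proposed method.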
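
The one-round pattern described above can be illustrated on a toy mean-estimation problem: each user sends a single local estimate, and the server groups similar estimates and pools them, without being told how many groups exist. This is a sketch under stated assumptions and not the authors' method. The scalar setting, the gap-based grouping rule, the `threshold` parameter, and all function names are hypothetical choices made for illustration.

```python
from statistics import fmean


def cluster_by_gap(estimates, threshold):
    """Group scalar estimates: sort them and start a new cluster whenever
    two neighbours differ by more than `threshold`. The number of clusters
    is not given in advance. Assumes a non-empty list."""
    order = sorted(range(len(estimates)), key=lambda i: estimates[i])
    clusters = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if estimates[cur] - estimates[prev] > threshold:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters


def one_shot_aggregate(user_samples, threshold):
    """Single communication round: users upload local means, the server
    clusters them and returns a sample-size-weighted mean per cluster."""
    estimates = [fmean(samples) for samples in user_samples]  # computed locally
    clusters = cluster_by_gap(estimates, threshold)           # done on the server
    models = {}
    for members in clusters:
        total = sum(len(user_samples[i]) for i in members)
        pooled = sum(estimates[i] * len(user_samples[i]) for i in members) / total
        for i in members:
            models[i] = pooled
    return clusters, models


# Four hypothetical users drawn from two underlying distributions.
user_samples = [[1.0, 1.2], [0.9, 1.1], [5.0, 5.2], [4.8, 5.0]]
clusters, models = one_shot_aggregate(user_samples, threshold=1.0)
```

Each user ends up with an estimate computed from every sample in its group, which is the intuition behind matching the error rate of pooled learning. The grouping only works here because the local estimates are accurate enough to separate, which loosely mirrors the abstract's requirement of a minimum number of data points per user.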


Consensus-based Fast and Energy-Efficient Multi-Robot Task Allocation

Prabhat Mahato , Sudipta Saha , Chayan Sarkar , Md Shaghil

Category: Robotics

2022-09-21

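In a multi-robot system, the appropriate allocation of tasks to individual robots is a very significant component. The availability of a centralized infrastructure can guarantee an optimal allocation of tasks. However, in many important scenarios such as search and rescue, exploration, disaster management, and battlefields, on-the-fly allocation of dynamic tasks to the robots in a decentralized fashion is the only possible option. Efficient communication among the robots plays a crucial role in any such decentralized setting. Existing works on distributed Multi-Robot Task Allocation (MRTA) either assume that a network is available or use a naive communication paradigm. On the contrary, in most of these scenarios the network infrastructure is either unstable or unavailable, and ad hoc networking is the only resort. Recent wireless communication protocols based on synchronous transmission (ST) have been shown to be more efficient than traditional asynchronous-transmission-based protocols in ad hoc networks such as Wireless Sensor Network (WSN) and Internet of Things (IoT) applications. The current work is the first effort to use ST for MRTA. Specifically, we propose an algorithm that efficiently adapts ST-based many-to-many interaction and minimizes the information exchange needed to reach a consensus on task allocation. We demonstrate the efficacy of the proposed algorithm through an extensive simulation-based study of its latency and energy efficiency under different settings.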


Can GAN-induced Attribute Manipulations Impact Face Recognition?

Sudipta Banerjee , Aditi Aggarwal , Arun Ross

Category: Computer Vision

2022-09-07

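The impact of demographic factors such as age, sex, and race has been studied extensively in automated face recognition systems. However, the impact of digitally modified demographic and facial attributes on face recognition is relatively under-explored. In this work, we study the effect of attribute manipulations induced via generative adversarial networks (GANs) on face recognition performance. We conduct experiments on the CelebA dataset by intentionally modifying thirteen attributes using AttGAN and STGAN and evaluating their impact on two deep learning-based face verification methods, ArcFace and VGGFace. Our findings indicate that some attribute manipulations involving eyeglasses and digital alteration of sex cues can significantly impair face recognition, by up to 73%, and need further analysis.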
